Embracing data abundance: BookTest Dataset for Reading Comprehension

Authors

  • Ondrej Bajgar
  • Rudolf Kadlec
  • Jan Kleindienst
Abstract

There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets which are small relative to current computing possibilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes the BookTest, a new dataset similar to the popular Children’s Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still room for further improvement.

Related resources

Embracing Data Abundance

There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets which are small relative to current computing possibilities. This article makes a case for the community to move to larger data and offers the BookTest dataset as a step in that direction.


Dataset for the First Evaluation on Chinese Machine Reading Comprehension

Machine Reading Comprehension (MRC) has become enormously popular recently and has attracted a lot of attention. However, existing reading comprehension datasets are mostly in English. To add diversity in reading comprehension datasets, in this paper we propose a new Chinese reading comprehension dataset for accelerating related research in the community. The proposed dataset contains two diff...


Psycholinguistic Ambiance of Short Stories in Enhancing Students’ Reading Comprehension and Vocabulary Power

The present study was carried out to investigate the effect of short stories on students’ reading comprehension, vocabulary power and attitude towards the skill and the new instructional materials. The participants of the study were 120 grade 9 students of Dilla Secondary and Preparatory School. In order to gather data for the study, pre- and posttest of reading comprehension, pre and ...


Postgraduate English Students’ Metacognitive Awareness of Reading Strategies and Their Reading Comprehension: A Comparative Study

A fundamental necessity at postgraduate level is a developed strategic reading skill that permits digesting tremendous amounts of technical academic content. The need is even greater in EFL contexts and for postgraduate students majoring in English Language Teaching (ELT) and English Literature (EL), most of whom will ultimately seek a career in teaching. The aim of the present ex-post facto s...


Iranian EFL Learners L2 Reading Comprehension: The Effect of Online Annotations via Interactive White Boards

This study explores the effect of online annotations via Interactive White Boards (IWBs) on reading comprehension of Iranian EFL learners. To this aim, 60 students from a language institute were selected as homogeneous based on their performance on the Oxford Placement Test (2014). Then, they were randomly assigned to 3 experimental groups of 20, and subsequently exposed to the research treatment af...



Journal:
  • CoRR

Volume abs/1610.00956

Publication date 2016